AITopics | double dixie cup problem


double dixie cup problem


Query K-means Clustering and the Double Dixie Cup Problem

Neural Information Processing Systems, Nov-20-2025, 21:42:30 GMT

We consider the problem of approximate $K$-means clustering with outliers and side information provided by same-cluster queries and possibly noisy answers. Our solution shows that, under some mild assumptions on the smallest cluster size, one can obtain a $(1+\epsilon)$-approximation for the optimal potential with probability at least $1-\delta$, where $\epsilon > 0$ and $\delta \in (0,1)$, using an expected number of $O(\frac{K^3}{\epsilon \delta})$ noiseless same-cluster queries and comparison-based clustering of complexity $O(ndK + \frac{K^3}{\epsilon \delta})$; here, $n$ denotes the number of points and $d$ the dimension of the space. Compared to a handful of other known approaches that perform importance sampling to account for small cluster sizes, the proposed query technique reduces the number of queries by a factor of roughly $O(\frac{K^6}{\epsilon^3})$, at the cost of possibly missing very small clusters. We extend this setting to the case where some queries to the oracle produce erroneous information, and where certain points, termed outliers, do not belong to any cluster. Our proof techniques differ from previous methods used for $K$-means clustering analysis, as they rely on estimating the sizes of the clusters and the number of points needed for accurate centroid estimation, and on subsequent nontrivial generalizations of the double Dixie cup problem. We illustrate the performance of the proposed algorithm on both synthetic and real datasets, including MNIST and CIFAR-10.
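For readers unfamiliar with the double Dixie cup problem named in the abstract: it asks how many uniform random draws are needed until each of $K$ coupon types has been seen at least $m$ times. The classical result for the uniform case is an expected $K \ln K + (m-1) K \ln\ln K + O(K)$ draws. The sketch below is only a Monte Carlo illustration of that uniform case. It is not the paper's analysis, which the abstract says relies on nontrivial generalizations, and the function names `draws_until_m_copies` and `average_draws` are made up for this example.

```python
import random


def draws_until_m_copies(num_types, copies, rng):
    """Draw types uniformly at random until every type has appeared
    at least `copies` times; return the number of draws used."""
    counts = [0] * num_types
    satisfied = 0  # number of types that have reached `copies`
    draws = 0
    while satisfied < num_types:
        t = rng.randrange(num_types)
        counts[t] += 1
        draws += 1
        if counts[t] == copies:
            satisfied += 1
    return draws


def average_draws(num_types, copies, trials=2000, seed=0):
    """Monte Carlo estimate of the expected number of draws."""
    rng = random.Random(seed)
    total = sum(draws_until_m_copies(num_types, copies, rng)
                for _ in range(trials))
    return total / trials
```

Calling `average_draws(K, m)` for increasing `m` at a fixed `K` shows the cost of each extra copy growing far more slowly than the cost of the first complete set, which is the behavior the asymptotic formula predicts.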

double dixie cup problem, name change, query k-means clustering, (6 more...)


Technology: Information Technology > Artificial Intelligence > Machine Learning (0.36)


Reviews: Query K-means Clustering and the Double Dixie Cup Problem

Neural Information Processing Systems, Oct-7-2024, 04:15:07 GMT

This paper investigates active semi-supervised clustering under both noiseless (perfect oracle) and noisy (imperfect oracle) query responses. The authors provide probabilistic guarantees of a low approximation error relative to the optimal k-means objective. The corresponding query complexities are substantially lower than those in the existing literature. Importantly, as the authors note, their query complexity is independent of the size of the dataset. The main strength of the paper lies in the considerable technical rigour with which the subject has been handled.
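To make the remark about dataset-independent query complexity concrete, here is a minimal sketch of one way same-cluster queries can drive clustering: sample points, group the samples by querying against one representative per discovered group, stop once every group has enough samples, then assign all points to the nearest estimated centroid without further queries. This is an illustration under assumptions, not the authors' algorithm. The oracle is simulated from ground-truth labels, and `make_oracle`, `query_kmeans`, and the `samples_per_cluster` parameter are hypothetical names introduced here.

```python
import random


def make_oracle(labels):
    """Simulated noiseless same-cluster oracle (assumption: ground-truth
    labels are available to the simulator). Also counts queries."""
    state = {"queries": 0}

    def same_cluster(i, j):
        state["queries"] += 1
        return labels[i] == labels[j]

    return same_cluster, state


def query_kmeans(points, k, samples_per_cluster, same_cluster, rng):
    """Estimate centroids from queried samples, then label every point."""
    groups = []  # each group holds sampled indices known to share a cluster
    order = list(range(len(points)))
    rng.shuffle(order)  # uniform sampling without replacement
    for idx in order:
        # Stop once all k clusters are found and each has enough samples.
        if len(groups) == k and all(len(g) >= samples_per_cluster
                                    for g in groups):
            break
        for g in groups:
            if same_cluster(idx, g[0]):  # at most one query per group
                g.append(idx)
                break
        else:
            groups.append([idx])  # no match: a new cluster was discovered

    dim = len(points[0])
    centroids = [[sum(points[i][d] for i in g) / len(g) for d in range(dim)]
                 for g in groups]

    def nearest(p):
        return min(range(len(centroids)),
                   key=lambda c: sum((p[d] - centroids[c][d]) ** 2
                                     for d in range(dim)))

    # Final assignment uses only distance computations, no oracle queries.
    return centroids, [nearest(p) for p in points]
```

In this sketch the number of oracle calls depends on how many samples are drawn before every group is full, which is governed by the cluster proportions and `samples_per_cluster` rather than by the total number of points; only the final nearest-centroid pass touches all of the data.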

assumption, k-means solution, query, (12 more...)


Genre: Research Report (0.58)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.51)


Query K-means Clustering and the Double Dixie Cup Problem

Chien, I; Pan, Chao; Milenkovic, Olgica

Neural Information Processing Systems, Feb-14-2020, 18:43:57 GMT

We consider the problem of approximate $K$-means clustering with outliers and side information provided by same-cluster queries and possibly noisy answers. Our solution shows that, under some mild assumptions on the smallest cluster size, one can obtain a $(1+\epsilon)$-approximation for the optimal potential with probability at least $1-\delta$, where $\epsilon > 0$ and $\delta \in (0,1)$, using an expected number of $O(\frac{K^3}{\epsilon \delta})$ noiseless same-cluster queries and comparison-based clustering of complexity $O(ndK + \frac{K^3}{\epsilon \delta})$; here, $n$ denotes the number of points and $d$ the dimension of the space. Compared to a handful of other known approaches that perform importance sampling to account for small cluster sizes, the proposed query technique reduces the number of queries by a factor of roughly $O(\frac{K^6}{\epsilon^3})$, at the cost of possibly missing very small clusters. We extend this setting to the case where some queries to the oracle produce erroneous information, and where certain points, termed outliers, do not belong to any cluster. Our proof techniques differ from previous methods used for $K$-means clustering analysis, as they rely on estimating the sizes of the clusters and the number of points needed for accurate centroid estimation, and on subsequent nontrivial generalizations of the double Dixie cup problem.

delta, double dixie cup problem, query k-means clustering, (4 more...)


Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Clustering (0.40)
